Ensemble Correlation Coefficient
Authors
Abstract
Elements in a data sample are described by their characteristics, which in turn are represented by variables. Identifying the relationships between these variables is crucial for prediction, hypothesis testing, and decision making. The relationship between two variables is often quantified using a correlation coefficient. Once the correlation is known, it can be used to make predictions: when two variables are highly correlated and one of them has been observed, the other can be predicted, and the stronger the relationship, the more accurate the prediction. Among the several correlation coefficients available, the Pearson correlation coefficient has been commonly used. Distance correlation and the maximal information coefficient were introduced more recently to address the shortcomings of the Pearson correlation coefficient. In this paper, we compare these coefficients through a set of simulations and combine them to introduce a more robust coefficient that can be used generally.
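As a rough illustration of the comparison the abstract describes, the sketch below computes the Pearson correlation coefficient and the sample distance correlation with NumPy on a linear and a parabolic relationship. The function names, the simulated data, and in particular `combined_score` (which simply takes the larger of the two values) are assumptions made for this example; the abstract does not state how the paper combines the coefficients, and the maximal information coefficient is left out because it needs a dedicated estimator.

```python
import numpy as np


def pearson(x, y):
    """Pearson correlation coefficient of two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def _double_centered_distances(v):
    # Pairwise absolute differences with row, column and grand means removed.
    d = np.abs(v[:, None] - v[None, :])
    return (d - d.mean(axis=0, keepdims=True)
              - d.mean(axis=1, keepdims=True) + d.mean())


def distance_correlation(x, y):
    """Sample distance correlation (biased V-statistic form) of two 1-D samples."""
    a = _double_centered_distances(np.asarray(x, dtype=float))
    b = _double_centered_distances(np.asarray(y, dtype=float))
    dcov2 = (a * b).mean()            # squared sample distance covariance
    denom = np.sqrt((a * a).mean() * (b * b).mean())
    if denom == 0.0:
        return 0.0                    # at least one sample is constant
    return float(np.sqrt(max(dcov2, 0.0) / denom))


def combined_score(x, y):
    # Hypothetical combination rule (an assumption, not the paper's method):
    # report the larger of |Pearson| and distance correlation.
    return max(abs(pearson(x, y)), distance_correlation(x, y))


# Simulated data: one linear and one purely nonlinear dependence on x.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 500)
linear = 2.0 * x + 1.0
parabola = x ** 2

for name, y in [("linear", linear), ("parabola", parabola)]:
    print(name, round(pearson(x, y), 3), round(distance_correlation(x, y), 3))
```

On the linear data both measures equal 1 up to rounding. On the parabola the Pearson coefficient stays close to zero while the distance correlation remains clearly positive, which is the kind of shortcoming the newer measures are meant to address.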
Similar resources
Application of ensemble learning techniques to model the atmospheric concentration of SO2
For pollution prediction modeling, the study adopts homogeneous (random forest, bagging, and additive regression) and heterogeneous (voting) ensemble classifiers to predict the atmospheric concentration of sulphur dioxide. For model validation, results were compared against widely known single base classifiers such as support vector machine, multilayer perceptron, linear regression and re...
Combining Bagging, Boosting and Random Subspace Ensembles for Regression Problems
Bagging, boosting and random subspace methods are well-known re-sampling ensemble methods that generate and combine a diversity of learners using the same learning algorithm for the base regressor. In this work, we built an ensemble of bagging, boosting and random subspace ensembles with 8 sub-regressors in each one, and an averaging methodology is then used for the final prediction. We ...
Ensemble of M5 Model Tree Based Modelling of Sodium Adsorption Ratio
This work reports the results of four ensemble approaches with the M5 model tree as the base regression model to anticipate Sodium Adsorption Ratio (SAR). Ensemble methods that combine the output of multiple regression models have been found to be more accurate than any of the individual models making up the ensemble. In this study additive boosting, bagging, rotation forest and random subspace...
Diversity in
Diversity in ensembles is the key to improved accuracy. Six pairwise diversity measures have been studied, and Plain Disagreement and the Kappa coefficient are recommended for ensemble construction as they exhibit high correlation with error reduction. We have also verified that the correlation between diversity and error reduction increases with ensemble size. This study also ...
The Usefulness of Ensemble Regressions and Optimal Predictor Variable Selection Methods in Predicting Stock Returns
This paper examines the usefulness of ensemble regressions and of methods for selecting optimal predictor variables (including the correlation-based method and Relief) for predicting the stock returns of companies listed on the Tehran Stock Exchange. To assess the performance of ensemble regression, the evaluation measures (including mean absolute percentage error, root mean squared error, and the coefficient of determination) for this method's predictions are compared with those of linear regression and artificial neural networks...
MProfiler: A Profile-Based Method for DNA Motif Discovery
Motif finding is one of the important tasks in gene regulation, which is essential to understanding biological cell functions. According to the study by Tompa et al., the performance of current motif finders is not satisfactory. A number of ensemble methods have been proposed to enhance the results. The overall performance of existing ensemble methods is better than that of stand-alone motif finders. A recent ensemble met...